Visualizing textual models with in-text and word-as-pixel highlighting

Figure: A topic model's token-level posterior memberships P(z_t | w_t) shown as in-text annotation (§3) and word-as-pixel (§4) views, from a corpus of U.S. presidential State of the Union speeches. Speeches are concatenated, running in columns; top-left is 1946, bottom-right is 2007. (This version shows a sample of tokens.)

Authors

  • Abram Handler
  • Su Lin Blodgett
  • Brendan O'Connor
Abstract

We explore two techniques that use color to make sense of statistical text models. One method uses in-text annotations to illustrate a model's view of particular tokens in particular documents. The other uses a high-level, "words-as-pixels" graphic to display an entire corpus. Together, these methods offer both zoomed-in and zoomed-out perspectives on a model's understanding of text. We show how these interconnected methods help diagnose a classifier's poor performance on Twitter slang and make sense of a topic model on historical political texts.

2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY, USA. Copyright by the author(s).
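The two views described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each token comes with a list of per-topic posterior probabilities, colors each token by its most probable topic, and uses that probability as the opacity. The names `PALETTE`, `highlight_tokens`, and `pixel_rows` are hypothetical, and the row-wise layout is a simplification of the column-wise layout mentioned in the figure caption.

```python
from html import escape

# Hypothetical palette: one RGB color per topic (an assumption, not from the paper).
PALETTE = [(31, 119, 180), (255, 127, 14), (44, 160, 44)]


def argmax(probs):
    """Index of the largest probability in a list."""
    return max(range(len(probs)), key=probs.__getitem__)


def highlight_tokens(tokens, posteriors, palette=PALETTE):
    """In-text view: wrap each token in an HTML span shaded by its top topic."""
    if len(tokens) != len(posteriors):
        raise ValueError("tokens and posteriors must have the same length")
    spans = []
    for token, probs in zip(tokens, posteriors):
        topic = argmax(probs)
        r, g, b = palette[topic % len(palette)]
        alpha = round(probs[topic], 2)  # more confident tokens are more opaque
        spans.append(
            f'<span style="background-color: rgba({r}, {g}, {b}, {alpha})" '
            f'title="topic {topic}">{escape(token)}</span>'
        )
    return " ".join(spans)


def pixel_rows(posteriors, width):
    """Word-as-pixel view: one cell per token, holding its top topic index."""
    topics = [argmax(probs) for probs in posteriors]
    return [topics[i:i + width] for i in range(0, len(topics), width)]
```

For example, `highlight_tokens(["war", "taxes"], [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]])` yields two shaded spans that can be pasted into an HTML page, while `pixel_rows` gives a grid of topic indices that a plotting library could render as colored cells.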




Publication date: 2016